Filling in NAs with last non-NA value

Problem

You want to replace NA's in a vector or factor with the last non-NA value.

Solution

The code below fills gaps in a vector step by step. If you need to do this repeatedly, use the function further down, which can also fill leading NA's with the first non-NA value and handles factors properly.

# Sample data
x <- c(NA,NA, "A","A", "B","B","B", NA,NA, "C", NA,NA,NA, "A","A","B", NA,NA)
# NA  NA  "A" "A" "B" "B" "B" NA  NA  "C" NA  NA  NA  "A" "A" "B" NA  NA 

goodIdx <- !is.na(x)
# FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE

# These are the non-NA values from x only
# Add a leading NA for later use when we index into this vector
goodVals <- c(NA, x[goodIdx])
# NA  "A" "A" "B" "B" "B" "C" "A" "A" "B"

# For each position in x, compute which element of goodVals to use.
# cumsum(goodIdx) counts the non-NA values seen so far; add 1 because
# goodVals has a leading NA (this also avoids indexing at zero).
fillIdx <- cumsum(goodIdx)+1
# 1  1  2  3  4  5  6  6  6  7  7  7  7  8  9 10 10 10

# The original vector with gaps filled
goodVals[fillIdx]
# NA  NA  "A" "A" "B" "B" "B" "B" "B" "C" "C" "C" "C" "A" "A" "B" "B" "B"
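The indexing trick above may be easier to follow on a short numeric vector. The following is a minimal sketch of the same steps; the sample values are made up for illustration and are not part of the original recipe.

```r
# Illustrative data (hypothetical values)
x <- c(NA, 3, NA, NA, 7, NA)

goodIdx <- !is.na(x)
# FALSE  TRUE FALSE FALSE  TRUE FALSE

# Running count of non-NA values seen so far
cumsum(goodIdx)
# 0 1 1 1 2 2

# Non-NA values, with a leading NA for positions before the first good value
goodVals <- c(NA, x[goodIdx])
# NA  3  7

# Shift the running count by 1 so that a count of 0 maps to the leading NA
fillIdx <- cumsum(goodIdx) + 1
# 1 2 2 2 3 3

filled <- goodVals[fillIdx]
filled
# NA  3  3  3  7  7
```

Each NA picks up the running count of the most recent non-NA value, so it indexes that same value in goodVals.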

A function for filling gaps

This function does the same thing as the code above. It can also fill leading NA's with the first non-NA value, and it handles factors properly.

fillNAgaps <- function (x, firstBack=FALSE) {
    ## NA's in a vector or factor are replaced with last non-NA values
    ## If firstBack is TRUE, it will fill in leading NA's with the first
    ## non-NA value. If FALSE, it will not change leading NA's.

    # If it's a factor, store the level labels and convert to integer
    if (is.factor(x)) {
        lvls <- levels(x)
        x    <- as.integer(x)
    }

    goodIdx <- !is.na(x)

    # These are the non-NA values from x only
    # Add a leading NA or take the first good value, depending on firstBack   
    if (firstBack)   goodVals <- c(x[goodIdx][1], x[goodIdx])
    else             goodVals <- c(NA,            x[goodIdx])

    # For each position in x, compute which element of goodVals to use.
    # cumsum(goodIdx) counts the non-NA values seen so far; add 1 because
    # goodVals has one extra leading element (this also avoids indexing at zero).
    fillIdx <- cumsum(goodIdx)+1

    x <- goodVals[fillIdx]

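    # If it was originally a factor, convert it back.
    # inherits=FALSE limits the check to this function's own variables, so an
    # unrelated object named lvls in the calling environment is not picked up.
    if (exists("lvls", inherits=FALSE))   x <- factor(x, levels=seq_along(lvls), labels=lvls)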

    x
}

# Sample data
x <- c(NA,NA, "A","A", "B","B","B", NA,NA, "C", NA,NA,NA, "A","A","B", NA,NA)
# NA  NA  "A" "A" "B" "B" "B" NA  NA  "C" NA  NA  NA  "A" "A" "B" NA  NA 
fillNAgaps(x)
# NA  NA  "A" "A" "B" "B" "B" "B" "B" "C" "C" "C" "C" "A" "A" "B" "B" "B"

# Fill the leading NA's with the first good value
fillNAgaps(x, firstBack=TRUE)
# "A" "A" "A" "A" "B" "B" "B" "B" "B" "C" "C" "C" "C" "A" "A" "B" "B" "B"

# It also works on factors
y <- factor(x)
# <NA> <NA> A    A    B    B    B    <NA> <NA> C    <NA> <NA> <NA> A    A    B    <NA> <NA>
# Levels: A B C
fillNAgaps(y)
# <NA> <NA> A    A    B    B    B    B    B    C    C    C    C    A    A    B    B    B   
# Levels: A B C

Notes

Adapted from na.locf() in the zoo package.
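For comparison, here is a hedged sketch of calling na.locf() directly. It assumes the zoo package is installed (the code is skipped otherwise), and the sample values are made up for illustration. Note that na.locf() drops leading NA's by default; passing na.rm=FALSE keeps them.

```r
# Illustrative data (hypothetical values)
x <- c(NA, 3, NA, NA, 7, NA)

# Only run if zoo is available; installing it is left to the reader
if (requireNamespace("zoo", quietly = TRUE)) {
    # Keep leading NA's, carrying the last observation forward
    filled <- zoo::na.locf(x, na.rm = FALSE)
    print(filled)

    # Default behavior: leading NA's are removed, so the result is shorter
    trimmed <- zoo::na.locf(x)
    print(length(trimmed))
}
```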